Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward.
Accelerating MCMC via Parallel Predictive Prefetching
We present a general framework for accelerating a large class of widely used
Markov chain Monte Carlo (MCMC) algorithms. Our approach exploits fast,
iterative approximations to the target density to speculatively evaluate many
potential future steps of the chain in parallel. In Bayesian inference
problems, the approach can accelerate sampling from the target distribution,
without compromising exactness, by exploiting subsets of data. It takes
advantage of
whatever parallel resources are available, but produces results exactly
equivalent to standard serial execution. In the initial burn-in phase of chain
evaluation, it achieves speedup over serial evaluation that is close to linear
in the number of available cores.
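The core mechanism can be illustrated with a toy sketch (not the authors' implementation; the 1-D Gaussian target and all names are illustrative assumptions). With a shared random stream, the density evaluations for both branches of each accept/reject decision can be computed in parallel before the decisions are resolved, and the resulting chain is exactly the serial Metropolis-Hastings chain:

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def log_target(x):
    # Stand-in for an expensive log-density (1-D standard normal here).
    return -0.5 * x * x

def serial_mh(x0, n_pairs, seed=0):
    """Plain Metropolis-Hastings; runs 2*n_pairs steps."""
    rng = random.Random(seed)
    chain, x, lp = [x0], x0, log_target(x0)
    for _ in range(2 * n_pairs):
        e, u = rng.gauss(0.0, 1.0), rng.random()
        y = x + e                              # random-walk proposal
        lq = log_target(y)
        if math.log(u) < lq - lp:
            x, lp = y, lq                      # accept
        chain.append(x)
    return chain

def prefetch_mh(x0, n_pairs, seed=0):
    """Speculative variant: per pair of steps, draw the randomness up front,
    build the tree of the three states the chain could visit, and evaluate
    the target at all of them in parallel BEFORE resolving either
    accept/reject decision.  Output is exactly the serial chain."""
    rng = random.Random(seed)
    chain, x, lp = [x0], x0, log_target(x0)
    with ThreadPoolExecutor(max_workers=3) as pool:
        for _ in range(n_pairs):
            e1, u1 = rng.gauss(0.0, 1.0), rng.random()
            e2, u2 = rng.gauss(0.0, 1.0), rng.random()
            y1 = x + e1                        # step-1 proposal
            y2a, y2r = y1 + e2, x + e2         # step-2 proposal on each branch
            f1, f2a, f2r = [pool.submit(log_target, s) for s in (y1, y2a, y2r)]
            l1 = f1.result()
            if math.log(u1) < l1 - lp:         # resolve step 1
                x, lp, y2, l2 = y1, l1, y2a, f2a.result()
            else:
                y2, l2 = y2r, f2r.result()
            chain.append(x)
            if math.log(u2) < l2 - lp:         # resolve step 2
                x, lp = y2, l2
            chain.append(x)
    return chain
```

In the actual framework the speculation tree is deeper and is steered by a cheap approximation to the target density, so most prefetched evaluations end up on the path the chain actually takes.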
Accelerating Markov chain Monte Carlo via parallel predictive prefetching
We present a general framework for accelerating a large class of widely used Markov chain Monte Carlo (MCMC) algorithms. This dissertation demonstrates that MCMC inference can be accelerated in a model of parallel computation that uses speculation to predict and complete computational work ahead of when it is known to be useful. By exploiting fast, iterative approximations to the target density, we can speculatively evaluate many potential future steps of the chain in parallel. In Bayesian inference problems, this approach can accelerate sampling from the target distribution, without compromising exactness, by exploiting subsets of data. It takes advantage of whatever parallel resources are available, but produces results exactly equivalent to standard serial execution. In the initial burn-in phase of chain evaluation, it achieves speedup over serial evaluation that is close to linear in the number of available cores.Engineering and Applied Science
Excitability Constraints on Voltage-Gated Sodium Channels
We study how functional constraints bound and shape evolution through an analysis of mammalian voltage-gated sodium channels. The primary function of sodium channels is to allow the propagation of action potentials. Since Hodgkin and Huxley, mathematical models have suggested that sodium channel properties need to be tightly constrained for an action potential to propagate. There are nine mammalian genes encoding voltage-gated sodium channels, many of which are more than ≈90% identical by sequence. This sequence similarity presumably corresponds to similarity of function, consistent with the idea that these properties must be tightly constrained. However, the multiplicity of genes encoding sodium channels raises the question: why are there so many? We demonstrate that the simplest theoretical constraints bounding sodium channel diversity—the requirements of membrane excitability and the uniqueness of the resting potential—directly constrain sodium channel properties. We compare the predicted constraints with functional data on mammalian sodium channel properties collected from the literature, including 172 different sets of measurements from 40 publications, wild-type and mutant, under a variety of conditions. The data from all channel types, including mutants, obey the excitability constraint; on the other hand, channels expressed in muscle tend to obey the constraint of a unique resting potential, while channels expressed in neuronal tissue do not. The excitability properties alone distinguish the nine sodium channels into four different groups that are consistent with phylogenetic analysis. Our calculations suggest interpretations for the functional differences between these groups.
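As a concrete illustration of the kind of excitability requirement such models impose, here is a minimal forward-Euler Hodgkin-Huxley simulation. It uses the classic squid-axon parameters, not the mammalian channels studied in the paper, and all names are ours: with the standard sodium conductance the membrane fires an action potential, while with sodium channels removed the same stimulus only depolarizes it passively.

```python
import math

def simulate(g_na, i_stim=10.0, t_max=25.0, dt=0.01):
    """Forward-Euler Hodgkin-Huxley membrane; returns peak voltage (mV).
    g_na is the maximal sodium conductance in mS/cm^2."""
    g_k, g_l = 36.0, 0.3                       # mS/cm^2
    e_na, e_k, e_l = 50.0, -77.0, -54.387      # reversal potentials, mV
    c_m = 1.0                                  # membrane capacitance, uF/cm^2

    def vtrap(x, y):
        # Numerically safe x / (1 - exp(-x/y)); limit is y as x -> 0.
        return y if abs(x / y) < 1e-7 else x / (1.0 - math.exp(-x / y))

    def rates(v):
        am, bm = 0.1 * vtrap(v + 40, 10), 4.0 * math.exp(-(v + 65) / 18)
        ah, bh = 0.07 * math.exp(-(v + 65) / 20), 1.0 / (1 + math.exp(-(v + 35) / 10))
        an, bn = 0.01 * vtrap(v + 55, 10), 0.125 * math.exp(-(v + 65) / 80)
        return am, bm, ah, bh, an, bn

    v = -65.0                                  # start at rest
    am, bm, ah, bh, an, bn = rates(v)
    m, h, n = am / (am + bm), ah / (ah + bh), an / (an + bn)  # steady-state gates
    peak = v
    for _ in range(int(t_max / dt)):
        i_na = g_na * m**3 * h * (v - e_na)    # fast sodium current
        i_k = g_k * n**4 * (v - e_k)           # delayed-rectifier potassium
        i_l = g_l * (v - e_l)                  # leak
        v += dt * (i_stim - i_na - i_k - i_l) / c_m
        am, bm, ah, bh, an, bn = rates(v)
        m += dt * (am * (1 - m) - bm * m)
        h += dt * (ah * (1 - h) - bh * h)
        n += dt * (an * (1 - n) - bn * n)
        peak = max(peak, v)
    return peak
```

With `g_na = 120` the peak voltage overshoots 0 mV (a spike); with `g_na = 0` it stays tens of millivolts below threshold, which is the excitability constraint in its crudest form.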
A linear and regularized ODF estimation algorithm to recover multiple fibers in Q-Ball imaging
Due to the well-known limitations of diffusion tensor imaging (DTI), high angular resolution diffusion imaging is currently of great interest to characterize voxels containing multiple fiber crossings. In particular, Q-ball imaging (QBI) is now a popular reconstruction method to obtain the orientation distribution function (ODF) of these multiple fiber distributions. The latter captures all important angular contrast by expressing the probability that a water molecule will diffuse into any given solid angle. However, QBI and other high order spin displacement estimation methods involve non-trivial numerical computations and lack a straightforward regularization process. In this paper, we propose a simple linear and regularized analytic solution for the Q-ball reconstruction of the ODF. First, the signal is modeled with a physically meaningful high order spherical harmonic series by incorporating the Laplace-Beltrami operator in the solution. This leads to an elegant mathematical simplification of the Funk-Radon transform using the Funk-Hecke formula. In doing so, we obtain a fast and robust model-free ODF approximation. We validate the accuracy of the ODF estimation quantitatively using the multi-tensor synthetic model where the exact ODF can be computed. We also demonstrate that the estimated ODF can recover known multiple fiber regions in a biological phantom and in the human brain. Another important contribution of the paper is the development of ODF sharpening methods. We show that sharpening the measured ODF enhances each underlying fiber compartment and considerably improves the extraction of fibers. The proposed techniques are simple linear transformations of the ODF and can easily be computed using our spherical harmonics machinery.
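The linear recipe described above can be sketched in a few lines: fit an even-order real spherical harmonic series to the signal with a Laplace-Beltrami smoothing penalty, then apply the Funk-Radon transform coefficient-by-coefficient via the Funk-Hecke scaling 2π P_ℓ(0). This is a hedged sketch of the published linear QBI approach, not the authors' code; function names, the sampling scheme, and the regularization weight are illustrative assumptions.

```python
import math
import numpy as np
from scipy.special import lpmv

def real_sh(l, m, theta, phi):
    """Real spherical harmonic Y_{lm} (theta = polar angle, phi = azimuth)."""
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - abs(m)) / math.factorial(l + abs(m)))
    p = lpmv(abs(m), l, np.cos(theta))         # associated Legendre P_l^{|m|}
    if m == 0:
        return norm * p
    if m > 0:
        return math.sqrt(2) * norm * p * np.cos(m * phi)
    return math.sqrt(2) * norm * p * np.sin(-m * phi)

def qball_odf_coeffs(signal, theta, phi, order=4, lam=0.006):
    """Regularized linear QBI: SH coefficients of the ODF from one voxel's
    HARDI signal sampled at directions (theta, phi)."""
    lm = [(l, m) for l in range(0, order + 1, 2) for m in range(-l, l + 1)]
    B = np.column_stack([real_sh(l, m, theta, phi) for l, m in lm])
    # Laplace-Beltrami penalty: Y_{lm} is an eigenfunction with eigenvalue -l(l+1).
    L = np.diag([float(l * (l + 1)) ** 2 for l, _ in lm])
    c = np.linalg.solve(B.T @ B + lam * L, B.T @ signal)   # smoothed SH fit

    def legendre0(l):
        # P_l(0) for even l: (-1)^{l/2} l! / (2^l ((l/2)!)^2)
        return (-1) ** (l // 2) * math.factorial(l) / (2 ** l * math.factorial(l // 2) ** 2)

    # Funk-Radon transform via the Funk-Hecke theorem: scale each coefficient.
    frt = np.array([2 * math.pi * legendre0(l) for l, _ in lm])
    return frt * c
```

A quick sanity check: an isotropic signal keeps only the l = 0 coefficient, and the resulting ODF is the constant 2π (the Funk-Radon transform of a constant is the circumference of a great circle).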
Apparent Diffusion Coefficients from High Angular Resolution Diffusion Images: Estimation and Applications
High angular resolution diffusion imaging (HARDI) has recently been of great interest in characterizing non-Gaussian diffusion processes. In the white matter of the brain, non-Gaussian diffusion occurs when fiber bundles cross, kiss or diverge within the same voxel. One important goal in current research is to obtain more accurate fits of the apparent diffusion processes in these multiple fiber regions, thus overcoming the limitations of classical diffusion tensor imaging (DTI). This paper presents an extensive study of high order models for apparent diffusion coefficient estimation and illustrates some of their applications. In particular, we first develop the appropriate mathematical tools to work on noisy HARDI data. Using a meaningful modified spherical harmonics basis to capture the physical constraints of the problem, we propose a new regularization algorithm to estimate a diffusivity profile that is smoother and closer to the true noise-free diffusivities. We define a smoothing term based on the Laplace-Beltrami operator for functions defined on the unit sphere. The properties of the spherical harmonics are then exploited to derive a closed form implementation of this term into the fitting procedure. We next derive the general linear transformation between the coefficients of a spherical harmonics series of order ℓ and the independent elements of the rank-ℓ high order diffusion tensor. An additional contribution of the paper is the careful study of state-of-the-art anisotropy measures for high order models computed from spherical harmonics or tensor coefficients. Their ability to characterize the underlying diffusion process is analyzed. We are able to reproduce published results and also able to recover voxels with isotropic, single fiber anisotropic and multiple fiber anisotropic diffusion. We test and validate the different approaches on apparent diffusion coefficients from synthetic data, from a biological phantom and from a human brain dataset.
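One widely used measure of the kind surveyed here, generalized fractional anisotropy (GFA, defined as std/rms of the function on the sphere), has a closed form directly in real spherical harmonic coefficients: since the l = 0 term carries the spherical mean, GFA = sqrt(1 - c_0² / Σ_j c_j²). A minimal sketch (the function name is ours):

```python
import math
import numpy as np

def gfa_from_sh(coeffs):
    """Generalized fractional anisotropy from real SH coefficients.
    coeffs[0] must be the l = 0 coefficient; the identity follows from
    Parseval's theorem on the orthonormal SH basis."""
    c = np.asarray(coeffs, dtype=float)
    power = float(np.dot(c, c))                # total spectral power
    if power == 0.0:
        return 0.0                             # zero function: define GFA = 0
    return math.sqrt(1.0 - c[0] ** 2 / power)
```

An isotropic profile (all power in c_0) gives GFA = 0, and the measure grows toward 1 as power shifts into the higher-order terms that encode angular contrast.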
StarFlow: A Script-Centric Data Analysis Environment
We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and user annotations, (2) command-line tools for exploring and propagating changes through the resulting dependency network, (3) support for workflow abstractions enabling robust parallel executions of complex analysis pipelines, and (4) a seamless interface with the Python scripting language. We describe a range of real applications of StarFlow, including automatic parallelization of complex workflows in the cloud.
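A toy version of the static-analysis half of this idea can be written with Python's own `ast` module: map each top-level assignment to the names its right-hand side reads, then propagate a change through the resulting dependency network. This is an illustrative sketch, not StarFlow's actual analysis, which also combines dynamic runtime tracing and user annotations.

```python
import ast

def dataflow_deps(source):
    """Static pass over a script: map each top-level assigned name to the
    names its right-hand side reads (a crude data-flow dependency network)."""
    deps = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            reads = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for target in node.targets:
                if isinstance(target, ast.Name):
                    deps[target.id] = reads
    return deps

def downstream(deps, changed):
    """Change propagation: every name that transitively reads `changed`
    and therefore needs to be recomputed."""
    dirty, frontier = set(), {changed}
    while frontier:
        frontier = {v for v, reads in deps.items()
                    if reads & frontier and v not in dirty}
        dirty |= frontier
    return dirty
```

For a script like `a = 1; b = a + 1; c = b * 2`, editing `a` marks `b` and `c` dirty, which is the behavior the command-line change-propagation tools build on.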
Flash Caching on the Storage Client
Flash memory has recently become popular as a caching medium. Most uses to date are on the storage server side. We investigate a different structure: flash as a cache on the client side of a networked storage environment. We use trace-driven simulation to explore the design space. We consider a wide range of configurations and policies to determine the potential benefit client-side caches might offer and how best to arrange them. Our results show that the flash cache writeback policy does not significantly affect performance. Write-through is sufficient; this greatly simplifies cache consistency handling. We also find that the chief benefit of the flash cache is its size, not its persistence. Cache persistence offers additional performance benefits at system restart at essentially no runtime cost. Finally, for some workloads a large flash cache allows using minuscule amounts of RAM for file caching (e.g., 256 KB), leaving more memory available for application use.
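The kind of trace-driven simulation used here can be reduced to a small model. The sketch below assumes simple LRU eviction and the write-through policy the results favor; all names are ours, not from the paper.

```python
from collections import OrderedDict

class WriteThroughFlashCache:
    """Tiny trace-driven model of a client-side flash cache with LRU
    eviction.  Writes go straight to the server (write-through), so the
    cache never holds dirty blocks and consistency handling stays simple."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()              # block id -> None, in LRU order
        self.hits = self.misses = self.server_writes = 0

    def access(self, block, is_write):
        if block in self.blocks:
            self.blocks.move_to_end(block)       # refresh LRU position
            self.hits += 1
        else:
            self.misses += 1                     # fetch from the server
            self.blocks[block] = None
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict LRU; never dirty
        if is_write:
            self.server_writes += 1              # write-through to the server

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# Replay a toy trace of (block id, is_write) records.
cache = WriteThroughFlashCache(capacity_blocks=2)
for blk, w in [(1, False), (2, False), (1, False), (3, False), (2, True)]:
    cache.access(blk, w)
```

Sweeping `capacity_blocks` over a real block trace is how one would reproduce the size-versus-persistence comparison: hit ratio responds strongly to capacity, while the write-through policy costs nothing beyond the server writes the workload already requires.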